Recently, skeleton-based action recognition has achieved rapid progress and superior performance. In this paper, we investigate this problem under a cross-dataset setting, which is a new, pragmatic, and challenging task for real-world scenarios. Following the unsupervised domain adaptation (UDA) paradigm, the action labels are available only on the source dataset but unavailable on the target dataset during the training stage. Unlike the conventional adversarial-learning-based approaches to UDA, we utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. Our inspiration is drawn from Cubism, an art genre from the early 20th century that breaks and reassembles objects to convey a greater context. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks to explore the temporal and spatial dependencies of skeleton-based actions and improve the generalization ability of the model. We conduct experiments on six datasets for skeleton-based action recognition, including three large-scale datasets (NTU RGB+D, PKU-MMD, and Kinetics), on which new cross-dataset settings and benchmarks are established. Extensive results demonstrate that our method outperforms state-of-the-art approaches. The source code of our model and all compared methods is available at https://github.com/shanice-l/st-cubism.
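A minimal sketch of the temporal Cubism-style pretext task described above, under my own simplifying assumptions (the segment count, tensor layout, and `permute_segments` helper are illustrative, not the authors' code): a skeleton clip is split into segments, the segments are shuffled, and the permutation index serves as the label for an auxiliary classification head.

```python
# Temporal-permutation pretext task sketch (assumptions mine, not the paper's code).
import itertools
import torch

NUM_SEGMENTS = 3
PERMUTATIONS = list(itertools.permutations(range(NUM_SEGMENTS)))  # 3! = 6 classes

def permute_segments(clip: torch.Tensor):
    """clip: (T, V, C) skeleton sequence -> (shuffled clip, permutation label)."""
    segments = torch.chunk(clip, NUM_SEGMENTS, dim=0)
    label = torch.randint(len(PERMUTATIONS), (1,)).item()
    shuffled = torch.cat([segments[i] for i in PERMUTATIONS[label]], dim=0)
    return shuffled, label

# Hypothetical usage: the permutation label supervises an auxiliary head so the
# backbone learns temporal structure without action annotations.
clip = torch.randn(60, 25, 3)            # 60 frames, 25 joints, xyz coordinates
shuffled_clip, perm_label = permute_segments(clip)
```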
Fine-grained sketch-based image retrieval (FG-SBIR) aims to find a specific image in a large gallery given a query sketch. Despite the wide applicability of FG-SBIR in many critical domains (e.g., crime activity tracking), existing approaches still suffer from low accuracy while being sensitive to external noise such as unnecessary strokes in the sketch. Retrieval performance deteriorates further under a more practical on-the-fly setting, where only a partially complete sketch with a few (noisy) strokes is available for retrieving the corresponding image. We propose a novel framework that leverages a uniquely designed deep reinforcement learning model performing dual-level exploration to handle partial-sketch training and attention region selection. By enforcing the model's attention on the important regions of the original sketch, it remains robust to unnecessary stroke noise and improves retrieval accuracy by a large margin. To sufficiently explore partial sketches and locate the important regions to attend to, the model applies a bootstrapped policy gradient for global exploration while adjusting a standard deviation term that governs a locator network for local exploration. The training process is guided by a hybrid loss that fuses a reinforcement loss and a supervised loss. A dynamic ranking reward is developed to fit the stochastic image retrieval process with partial sketches. Extensive experiments on three public datasets show that our proposed approach achieves state-of-the-art performance on partial-sketch-based image retrieval.
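The following is a rough, hypothetical sketch of how a hybrid objective of this kind could combine a supervised term with a policy-gradient term weighted by a ranking-based reward; the function name, the triplet formulation, and the reward scaling are my assumptions rather than the paper's implementation.

```python
# Hybrid supervised + reinforcement loss sketch (assumptions mine).
import torch
import torch.nn.functional as F

def hybrid_loss(anchor, positive, negative, log_prob, rank, gallery_size, alpha=0.5):
    """anchor/positive/negative: (B, D) embeddings; log_prob: log-probability of the
    sampled attention region; rank: retrieval rank of the true image (1 = best)."""
    supervised = F.triplet_margin_loss(anchor, positive, negative, margin=0.2)
    reward = 1.0 - (rank - 1) / gallery_size      # dynamic ranking reward in (0, 1]
    reinforce = -reward * log_prob.mean()         # REINFORCE-style policy-gradient term
    return supervised + alpha * reinforce
```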
Nowadays, pre-training big models on large-scale datasets has become a crucial topic in deep learning. Pre-trained models with high representation ability and transferability have achieved great success and dominated many downstream tasks in natural language processing and 2D vision. However, promoting such a pretraining-tuning paradigm to 3D vision is non-trivial, given the limited training data that are relatively inconvenient to collect. In this paper, we provide a new perspective of leveraging pre-trained 2D knowledge in the 3D domain to tackle this problem, tuning pre-trained image models with the novel Point-to-Pixel prompting for point cloud analysis at a minor parameter cost. Following the principle of prompt engineering, we transform point clouds into colorful images with geometry-preserved projection and geometry-aware coloring to adapt to pre-trained image models, whose weights are kept frozen during the end-to-end optimization of point cloud analysis tasks. We conduct extensive experiments to demonstrate that, in cooperation with the proposed Point-to-Pixel prompting, a better pre-trained image model leads to consistently better performance in 3D vision. Enjoying the prosperous development of image pre-training, our method attains 89.3% accuracy on the hardest setting of ScanObjectNN, surpassing conventional point cloud models with far fewer trainable parameters. Our framework also exhibits very competitive performance on ModelNet classification and ShapeNetPart segmentation. Code is available at https://github.com/wangzy22/p2p.
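A toy illustration of the point-to-pixel idea, assuming a plain orthographic projection and depth-based coloring, which are simplifications of the geometry-preserved projection and geometry-aware coloring described above; the `points_to_image` helper is hypothetical.

```python
# Project a point cloud to a pseudo image a frozen 2D backbone can consume (sketch).
import torch

def points_to_image(points: torch.Tensor, size: int = 224) -> torch.Tensor:
    """points: (N, 3) normalized to [-1, 1] -> (3, size, size) pseudo RGB image."""
    xy = ((points[:, :2] + 1) / 2 * (size - 1)).long().clamp(0, size - 1)
    depth = (points[:, 2] + 1) / 2                 # geometry-aware "color" from depth
    img = torch.zeros(3, size, size)
    img[:, xy[:, 1], xy[:, 0]] = depth             # broadcast depth to all 3 channels
    return img

img = points_to_image(torch.rand(1024, 3) * 2 - 1)
# A frozen ImageNet-pretrained model (weights not updated) would take img.unsqueeze(0).
```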
We present Point-BERT, a new paradigm for learning Transformers that generalizes the concept of BERT to 3D point clouds. Inspired by BERT, we devise a Masked Point Modeling (MPM) task to pre-train point cloud Transformers. Specifically, we first divide a point cloud into several local point patches, and a point cloud Tokenizer based on a discrete Variational AutoEncoder (dVAE) is designed to generate discrete point tokens containing meaningful local information. Then, we randomly mask out some patches of the input point cloud and feed them into the backbone Transformer. The pre-training objective is to recover the original point tokens at the masked locations under the supervision of the point tokens obtained by the Tokenizer. Extensive experiments demonstrate that the proposed BERT-style pre-training strategy significantly improves the performance of standard point cloud Transformers. Equipped with our pre-training strategy, we show that a pure Transformer architecture attains 93.8% accuracy on ModelNet40 and 83.1% accuracy on the hardest setting of ScanObjectNN, surpassing carefully designed point cloud models with far fewer hand-crafted designs. We also demonstrate that the representations learned by Point-BERT transfer well to new tasks and domains, where our models largely advance the state of the art in few-shot point cloud classification. The code and pre-trained models are available at https://github.com/lulutang0608/Point-BERT.
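A simplified sketch of a masked point modeling objective of the kind described above, with my own assumptions about the mask ratio, the stand-in for the [MASK] embedding, and the predictor interface; it is not the released Point-BERT code.

```python
# Masked point modeling loss sketch: predict discrete token ids at masked patches.
import torch
import torch.nn.functional as F

def mpm_loss(patch_embeddings, token_ids, predictor, mask_ratio=0.4):
    """patch_embeddings: (B, P, D) patch features; token_ids: (B, P) long dVAE token ids;
    predictor: module mapping (B, P, D) -> (B, P, vocab_size)."""
    B, P, _ = patch_embeddings.shape
    mask = torch.rand(B, P) < mask_ratio           # which patches are hidden
    masked = patch_embeddings.clone()
    masked[mask] = 0.0                             # simple stand-in for a [MASK] embedding
    logits = predictor(masked)                     # (B, P, vocab_size)
    # Cross-entropy only at masked positions, supervised by the tokenizer's ids.
    return F.cross_entropy(logits[mask], token_ids[mask])
```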
Federated learning (FL) enables devices in mobile edge computing (MEC) to collaboratively train a shared model without uploading their local data. Gradient compression can be applied to alleviate the communication overhead, but FL with gradient compression still faces great challenges. Toward green MEC, we propose FedGreen, which enhances the original FL with fine-grained gradient compression to efficiently control the total energy consumption of the devices. Specifically, we introduce the relevant operations, including device-side gradient reduction and server-side element-wise aggregation, to facilitate gradient compression in FL. On a public dataset, we investigate the contributions of the compressed local gradients under different compression ratios. After that, we formulate and solve a learning accuracy-energy efficiency tradeoff problem, in which the optimal compression ratio and computing frequency are derived for each device. Experimental results show that, given an 80% test accuracy requirement, FedGreen reduces the total energy consumption of the devices by at least 32% compared with the baseline schemes.
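A minimal sketch of the two operations the abstract mentions: device-side gradient reduction (here implemented as top-k sparsification, one common choice) and server-side element-wise aggregation. The helper names and the use of top-k are assumptions, not necessarily FedGreen's exact scheme.

```python
# Device-side compression and server-side aggregation sketch (assumptions mine).
import torch

def compress_topk(grad: torch.Tensor, ratio: float) -> torch.Tensor:
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    k = max(1, int(grad.numel() * ratio))
    _, indices = torch.topk(grad.abs().flatten(), k)
    sparse = torch.zeros_like(grad).flatten()
    sparse[indices] = grad.flatten()[indices]
    return sparse.view_as(grad)

def aggregate(grads):
    """Server-side element-wise averaging of per-device gradients."""
    return torch.stack(grads).mean(dim=0)

device_grads = [compress_topk(torch.randn(10, 10), ratio=0.1) for _ in range(4)]
global_grad = aggregate(device_grads)
```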
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
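A highly simplified sketch of the token-level fusion idea, assuming image and point-cloud tokens already share one embedding dimension and are processed jointly by a standard Transformer encoder; the token counts and encoder configuration are illustrative, and CMT's actual architecture is not reproduced here.

```python
# Joint processing of image and point-cloud tokens without explicit view transformation.
import torch
import torch.nn as nn

dim = 256
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
    num_layers=2,
)

image_tokens = torch.randn(1, 600, dim)   # e.g., a flattened image feature map
point_tokens = torch.randn(1, 400, dim)   # e.g., voxel/pillar features
fused = encoder(torch.cat([image_tokens, point_tokens], dim=1))  # (1, 1000, dim)
# A detection head (omitted here) would decode 3D boxes from the fused tokens.
```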
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
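A bare-bones illustration of the NAIVEATTACK idea described above: stamp a fixed trigger patch onto a fraction of the raw images and relabel them to a target class before distillation begins. The patch size, poison rate, and `add_trigger` helper are my assumptions.

```python
# Trigger injection into raw data prior to distillation (sketch, assumptions mine).
import torch

def add_trigger(images, labels, target_class=0, poison_rate=0.1, patch=4):
    """images: (N, C, H, W) in [0, 1]; labels: (N,); returns poisoned copies."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(len(images) * poison_rate)
    idx = torch.randperm(len(images))[:n_poison]
    images[idx, :, -patch:, -patch:] = 1.0        # white square trigger in the corner
    labels[idx] = target_class                    # relabel to the attacker's target
    return images, labels

poisoned_x, poisoned_y = add_trigger(torch.rand(100, 3, 32, 32),
                                     torch.randint(10, (100,)))
```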
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold: first, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
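A compact, hypothetical sketch of mask-based dynamic weighting: support masks pool support features into a class center, which then re-weights the query feature map channel-wise. The sigmoid gating and the `reweight_query` helper are my simplifications, not the RefT module.

```python
# Mask-pooled class center used to gate query features (sketch, assumptions mine).
import torch

def reweight_query(support_feat, support_mask, query_feat):
    """support_feat: (C, H, W); support_mask: (H, W) in {0, 1}; query_feat: (C, H, W)."""
    mask = support_mask.unsqueeze(0)                                          # (1, H, W)
    center = (support_feat * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1)  # (C,)
    weights = torch.sigmoid(center)                                           # channel gates
    return query_feat * weights.view(-1, 1, 1)

out = reweight_query(torch.randn(256, 32, 32),
                     (torch.rand(32, 32) > 0.5).float(),
                     torch.randn(256, 32, 32))
```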
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetV2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even though the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Massive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods; e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN-/Transformer-based models while trading off model accuracy and efficiency well.
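A loose sketch of the inverted-residual structure the iRMB builds on, showing only the MobileNetV2-like expand, depthwise, and project stages with a residual connection; the real iRMB additionally injects attention-style dynamic modeling, which is omitted here.

```python
# Minimal inverted-residual block (expand -> depthwise -> project) as a reference point.
import torch
import torch.nn as nn

class InvertedResidualSketch(nn.Module):
    def __init__(self, dim, expand=4):
        super().__init__()
        hidden = dim * expand
        self.block = nn.Sequential(
            nn.Conv2d(dim, hidden, 1), nn.BatchNorm2d(hidden), nn.SiLU(),       # expand
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden), nn.SiLU(),  # depthwise
            nn.Conv2d(hidden, dim, 1), nn.BatchNorm2d(dim),                     # project
        )

    def forward(self, x):
        return x + self.block(x)   # cheap residual modeling of short-distance dependency

y = InvertedResidualSketch(64)(torch.randn(1, 64, 56, 56))
```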
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on carefully designed data augmentations, while inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable, since important clustering information is ignored. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
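A schematic sketch of the cross-view objective described above, under my own assumptions about the interface: embeddings of the same high-confidence samples from the two views form positive pairs, and the centers of different high-confidence clusters form negative pairs, both scored by cosine similarity.

```python
# Cluster-guided cross-view contrastive objective (sketch, assumptions mine).
import torch
import torch.nn.functional as F

def cluster_contrastive_loss(z1, z2, centers1, centers2):
    """z1, z2: (N, D) cross-view embeddings of the same high-confidence samples;
    centers1, centers2: (K, D) high-confidence cluster centers from the two views."""
    pos = F.cosine_similarity(z1, z2, dim=1).mean()                  # positive pairs
    sim = F.cosine_similarity(centers1.unsqueeze(1), centers2.unsqueeze(0), dim=2)
    neg = sim[~torch.eye(len(centers1), dtype=torch.bool)].mean()    # different clusters
    return neg - pos    # raise positive similarity, lower negative similarity

loss = cluster_contrastive_loss(torch.randn(32, 16), torch.randn(32, 16),
                                torch.randn(5, 16), torch.randn(5, 16))
```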